Abstract. The stratified proportional intensity model generalizes Cox's proportional intensity
model by allowing different groups of the population under study to have distinct baseline
intensity functions. In this article, we consider the problem of estimation in this model when
the variable indicating the stratum is unobserved for some individuals in the studied sample.
In this setting, we construct nonparametric maximum likelihood estimators for the parameters
of the stratified model and we establish their consistency and asymptotic normality. Consistent
estimators for the limiting variances are also obtained.
1. Introduction

This paper considers the problem of estimation in the stratified proportional intensity re- 
gression model for survival data, when the stratum information is missing for some sample 
individuals. 

The stratified proportional intensity model (see Andersen et al. (1993) or Martinussen and
Scheike (2006) for example) generalizes the usual Cox (1972) proportional intensity regression
model for survival data, by allowing different groups (the strata) of the population under study
to have distinct baseline intensity functions. More precisely, in the stratified model, the
strata divide the sample individuals into $K$ disjoint groups, each having a distinct baseline
intensity function $\lambda_k$ but a common value for the regression parameter.

The intensity function for the failure time $T^0$ of an individual in stratum $k$ thus takes
the form

$$\lambda_k(t) \exp(\beta' X), \tag{1}$$

where $X$ is a $p$-vector of covariates, $\beta$ is a $p$-vector of unknown regression parameters of
interest, and $\{\lambda_k(t) : t \geq 0, k = 1, \ldots, K\}$ are $K$ unknown baseline intensity functions.

A consistent and asymptotically normal estimator of $\beta$ can be obtained by maximizing
the partial likelihood function (Cox, 1975). The partial likelihood for the stratified
model (1) is the product over strata of the within-stratum partial likelihoods (we refer to
Andersen et al. (1993) for a detailed treatment of maximum partial likelihood estimation
in model (1)). In some applications, it can also be desirable to estimate the cumulative
baseline intensity functions $\Lambda_k = \int \lambda_k$. The so-called Breslow (1972) estimators are
commonly used for that purpose (see chapter 7 of Andersen et al. (1993) for further details on
the Breslow estimator and its asymptotic properties).

One major motivation for using the stratified model is that it makes it possible to accommodate
in the analysis a predictive categorical covariate whose effect on the intensity is not
proportional. To this end, the individuals under study are stratified with respect to the categories
of this covariate. In many applications however, this covariate may be missing for some
sample individuals (for example, histological stage determination may require a biopsy which,
because of its cost, may not be performed on all the study subjects). In this case, the
usual statistical inference for model (1), based on the product of within-stratum partial
likelihoods, cannot be directly applied.

In this work, we consider the problem of estimating $\beta$ and the $\Lambda_k$, $k = 1, \ldots, K$, in model
(1), when the covariate defining the stratum is missing for some (but not all) individuals.
Equivalently, we consider the problem of estimating model (1) when the stratum
information is only partially available.

The problem of estimation in the (unstratified) Cox regression model $\lambda(t) \exp(\beta' X)$ with
missing covariate $X$ has been the subject of intense research over the past decade: see for
example Lin and Ying (1993), Paik (1997), Paik and Tsai (1997), Chen and Little (1999),
Martinussen (1999), Pons (2002), and the references therein. But to the best of our knowledge
and despite its practical relevance, the problem of statistical inference in model (1) with
partially available stratum information has not yet been extensively investigated. Recently,
Dupuy and Leconte (2008) studied the asymptotic properties of a regression calibration
estimator of $\beta$ in this setting (regression calibration is a general method for handling missing
data in regression models, see Carroll et al. (1995) for example). The authors proved that
this estimator is asymptotically biased, although asymptotically normal. No
estimators of the cumulative baseline intensity functions were provided.

In this work, we aim at providing an estimator of $\beta$ that is both consistent and
asymptotically normal. Moreover, although the cumulative intensity functions $\Lambda_k$ are usually not
the primary parameters of interest, we also aim at providing consistent and asymptotically
normal estimators of the values $\Lambda_k(t)$, $k = 1, \ldots, K$.

The regression calibration inferential procedure investigated by Dupuy and Leconte (2008)
is essentially based on a modified version of the partial likelihood for model (1). In this
paper, we propose an alternative method which may be viewed as a fully maximum likelihood
approach. Besides assuming that the failure intensity function for an individual in stratum
$k$ is given by model (1), we assume that the probability of being in stratum $k$ conditionally
on a set of observed covariates $W$ (which may include some components of $X$) is of the
logistic form, depending on some unknown finite-dimensional parameter $\gamma$.

A full likelihood for the collected parameter $\theta = (\beta, \gamma, \Lambda_k; k = 1, \ldots, K)$ is constructed
from a sample of incompletely observed data. Based on this, we propose to estimate the
finite and infinite-dimensional components of $\theta$ by using the nonparametric maximum
likelihood (NPML) estimation method. We then provide asymptotic results for these estimators,
including consistency, asymptotic normality, semiparametric efficiency of the NPML
estimator of $\beta$, and consistent variance estimation.

Our proofs use some techniques developed by Murphy (1994, 1995) and Parner (1998)
to establish the asymptotic theory for the frailty model.
The paper is organized as follows. In Section 2, we describe in greater detail the data
structure and the model assumptions. In Section 3, we describe the NPML estimation
method for our setting and we establish existence of the NPML estimator of $\theta$. Section 4
establishes the consistency and asymptotic normality of the proposed estimator. Consistent
variance estimators are also obtained for both the finite-dimensional parameter estimators
and the nonparametric cumulative baseline intensity estimators. We give some concluding
remarks in Section 5. Proofs are given in the Appendix.



2. Data structure and model assumptions 

We describe the notations and model assumptions that will be used throughout the paper. 

All the random variables are defined on a probability space $(\Omega, \mathcal{C}, P)$. Let $T^0$ be a
random failure time whose distribution depends on a vector of covariates $X \in \mathbb{R}^p$ and
on a stratum indicator $S \in \mathcal{K} = \{1, \ldots, K\}$. We assume that conditionally on $X$ and
$S = k$ ($k \in \mathcal{K}$), the intensity function of $T^0$ is given by model (1). We suppose that $T^0$
may be right-censored by a positive random variable $C$ and that the analysis is restricted
to the time interval $[0, \tau]$, where $\tau < \infty$ denotes the end of the study. Thus we actually
observe the potentially censored duration $T = \min\{T^0, \min(C, \tau)\}$ and a censoring indicator
$\Delta = 1\{T^0 \leq \min(C, \tau)\}$. If $t \in [0, \tau]$, we denote by $N(t) = 1\{T \leq t\}\Delta$ and $Y(t) = 1\{T \geq t\}$
the failure counting and at-risk processes respectively.
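
As an illustration of this data structure, the following minimal Python sketch builds the observed pair $(T, \Delta)$ from latent failure and censoring times and evaluates the processes $N(t)$ and $Y(t)$ at a fixed time. The function names (`observe`, `counting_process`, `at_risk`) are hypothetical, and the sketch assumes the conventions $\Delta = 1\{T^0 \leq \min(C, \tau)\}$, $N(t) = 1\{T \leq t\}\Delta$ and $Y(t) = 1\{T \geq t\}$.

```python
import numpy as np

def observe(t0, c, tau):
    """Return (T, Delta) from latent failure times t0, censoring times c, study end tau."""
    t0 = np.asarray(t0, dtype=float)
    c = np.asarray(c, dtype=float)
    cens = np.minimum(c, tau)            # effective censoring time min(C, tau)
    T = np.minimum(t0, cens)             # observed duration
    delta = (t0 <= cens).astype(int)     # 1 if the failure is observed
    return T, delta

def counting_process(T, delta, t):
    """N(t) = 1{T <= t} * Delta, evaluated for every subject."""
    return ((np.asarray(T) <= t) & (np.asarray(delta) == 1)).astype(int)

def at_risk(T, t):
    """Y(t) = 1{T >= t}, evaluated for every subject."""
    return (np.asarray(T) >= t).astype(int)

T, delta = observe([1.0, 5.0, 3.0], [2.0, 2.0, 10.0], tau=4.0)
N_at_2 = counting_process(T, delta, 2.0)
Y_at_2 = at_risk(T, 2.0)
```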

Let $W \in \mathbb{R}^m$ be a vector of surrogate covariates for $S$ ($W$ and $X$ may share some
common components). That is, $W$ brings partial information about $S$ when $S$ is missing,
and it adds no information when $S$ is observed, so that the distribution of $T^0$ conditionally
on $X$, $S$, and $W$ does not involve the components of $W$ that are not in $X$. We assume
that the conditional probability that an individual belongs to the $k$-th stratum given his
covariate vector $W$ follows a multinomial logistic model:

$$P(S = k \mid W) = \frac{\exp(\gamma_k' W)}{\sum_{j=1}^{K} \exp(\gamma_j' W)},$$

where $\gamma_k \in \mathbb{R}^m$ ($k \in \mathcal{K}$). Finally, we let $R$ denote the indicator variable which is 1 if $S$ is
observed and 0 otherwise. Then, the data consist of $n$ i.i.d. replicates

$$O_i = (T_i, \Delta_i, X_i, W_i, R_i, R_i S_i), \quad i = 1, \ldots, n,$$

of $O = (T, \Delta, X, W, R, RS)$. The data available for the $i$-th individual are therefore
$(T_i, \Delta_i, X_i, W_i, S_i)$ if $R_i = 1$ and $(T_i, \Delta_i, X_i, W_i)$ if $R_i = 0$.

In the sequel, we set $\gamma_K = 0$ for model identifiability purposes and we note
$\gamma = (\gamma_1', \ldots, \gamma_{K-1}')' \in (\mathbb{R}^m)^{K-1} \equiv \mathbb{R}^q$. We also note $\pi_{k,\gamma}(W) = P(S = k \mid W)$, $k \in \mathcal{K}$. Now, let
$\theta = (\beta, \gamma, \Lambda_k; k \in \mathcal{K})$ be the collected parameter and $\theta_0 = (\beta_0, \gamma_0, \Lambda_{k,0}; k \in \mathcal{K})$ denote the
true parameter value. Under the true value $\theta_0$, the expectation of random variables will be
denoted $P_{\theta_0}$. $P_n$ will denote the empirical probability measure. In the sequel, the stochastic
convergences will be in terms of outer measure.
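
The stratum probabilities $\pi_{k,\gamma}(W)$ can be evaluated with a short routine. The sketch below is only illustrative: the function name `stratum_probs` and the array layout (one row of `gamma` per stratum $1, \ldots, K-1$) are assumptions, and the last stratum is used as the reference with $\gamma_K = 0$.

```python
import numpy as np

def stratum_probs(gamma, W):
    """Multinomial logistic probabilities pi_{k,gamma}(W).

    gamma : array of shape (K-1, m), rows gamma_1, ..., gamma_{K-1} (gamma_K = 0).
    W     : array of shape (n, m), one covariate vector per subject.
    Returns an (n, K) array whose rows sum to 1.
    """
    gamma = np.asarray(gamma, dtype=float)
    W = np.atleast_2d(np.asarray(W, dtype=float))
    # Linear scores gamma_k' W, with a zero column for the reference stratum K.
    scores = np.hstack([W @ gamma.T, np.zeros((W.shape[0], 1))])
    # Subtract the row maximum for numerical stability (does not change the ratios).
    scores -= scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

# Two strata, one covariate: odds of stratum 1 versus stratum 2 equal 3 when W = 1.
probs = stratum_probs([[np.log(3.0)]], [[1.0]])
```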

We now make the following additional assumptions: 

(a) The censoring time $C$ is independent of $T^0$ given $(S, X, W)$, of $S$ given $(X, W)$, and is
non-informative. With probability 1, $P(C \geq T^0 \geq \tau \mid S, X, W) > c_0$ for some positive
constant $c_0$.
(b) The parameter values $\beta_0$ and $\gamma_0$ lie in the interior of known compact sets $\mathcal{B} \subset \mathbb{R}^p$ and
$\mathcal{G} \subset \mathbb{R}^q$ respectively. For every $k \in \mathcal{K}$, the cumulative baseline intensity function $\Lambda_{k,0}$
is a strictly increasing function on $[0, \tau]$ with $\Lambda_{k,0}(0) = 0$ and $\Lambda_{k,0}(\tau) < \infty$. Moreover,
for every $k \in \mathcal{K}$, $\Lambda_{k,0}$ is continuously differentiable in $[0, \tau]$, with $\lambda_{k,0}(t) = d\Lambda_{k,0}(t)/dt$.
Let $\mathcal{A}$ denote the set of functions satisfying these properties.

(c) The covariate vectors $X$ and $W$ are bounded (i.e. $\|X\| \leq c_1$ and $\|W\| \leq c_1$, for some
finite positive constant $c_1$, where $\|\cdot\|$ denotes the Euclidean norm). Moreover, the
covariance matrices of $X$ and $W$ are positive definite. Let $c_2 = \min_{\beta \in \mathcal{B}, \|x\| \leq c_1} e^{\beta' x}$
and $c_3 = \max_{\beta \in \mathcal{B}, \|x\| \leq c_1} e^{\beta' x}$.


(d) There is a constant $c_4 > 0$ such that for every $k \in \mathcal{K}$, $P_{\theta_0}[1\{S = k\} Y(\tau) R] > c_4$, and
the sample size $n$ is large enough to ensure that $\sum_{i=1}^{n} 1\{S_i = k\} Y_i(\tau) R_i > 0$ for every
$k \in \mathcal{K}$.

(e) With probability 1, there exists a positive constant $c_5$ such that for every $k \in \mathcal{K}$,
$P_{\theta_0}[\Delta 1\{S = k\} \mid T, X, W] > c_5$.

(f) $R$ is independent of $S$ given $W$, and of $(T, \Delta)$ given $(X, S)$. The distribution of $S$
conditionally on $X$ and $W$ does not involve the components of $X$ that are not in $W$.
The distributions of $R$ and of the covariate vectors $X$ and $W$ do not depend on the
parameter $\theta$.

Remark 1. Conditions (b), (c), (d), and (e) are used for identifiability of the parameters
and consistency of the proposed estimators. Condition (d) essentially requires that for every
stratum $k$, some subjects are known to belong to $k$ and are still at risk when the study ends.
The first assumption in condition (f) states that $S$ is missing at random, which is a fairly
general missing data situation (we refer to chapters 6 and 7 in Tsiatis (2006) for a recent
exposition of missing data mechanisms).

Remark 2. We are now in position to describe our proposed approach to the problem
of estimation in model (1) from a sample of incomplete data $O_i$, $i = 1, \ldots, n$.
Let $\mathcal{S}$ denote the set of subjects with unknown stratum in this sample. The regression
calibration method investigated by Dupuy and Leconte (2008) essentially allocates every
subject of $\mathcal{S}$ to each of the strata, and estimates $\beta_0$ by maximizing a modified version of
the partial likelihood for the stratified model, where the contribution of any individual $i$
in $\mathcal{S}$ to the within-$k$-th-stratum partial likelihood is weighted by an estimate of $\pi_{k,\gamma}(W_i)$
(for every $k \in \mathcal{K}$). The asymptotic bias of the resulting estimator arises from the failure of
this method to fully exploit the information carried by $(T_i, \Delta_i, X_i, W_i)$ on the unobserved
stratum indicator $S_i$.

Therefore in this paper, we rather suggest weighting each subject $i$ in $\mathcal{S}$ by an estimate
of the conditional probability that subject $i$ belongs to the $k$-th stratum given the whole
observed data $(T_i, \Delta_i, X_i, W_i)$. This suggestion raises two main problems, as described
below.

Remark 3. First, we should note that the suggested alternative weights depend on the
unknown baseline intensity functions. Therefore, the modified partial likelihood approach
considered by Dupuy and Leconte (2008) cannot be used to derive an estimator for $\beta_0$.

Next, the statistics to be involved in the score function for $\beta$ will depend on the
conditional weights and thus, this score will not be expressible as a stochastic integral of some
predictable process, as is often the case in models for failure time data. This, in turn, will
prevent us from using the counting process martingale theory usually associated with the
theoretical developments in failure time models.

To overcome the first problem, we define our estimators from a full likelihood for the whole
parameter, that is, for both the finite-dimensional ($\beta$ and $\gamma$) and infinite-dimensional ($\Lambda_k$,
$k \in \mathcal{K}$) components of $\theta$. Empirical process theory (van der Vaart and Wellner, 1996) is
used to establish asymptotics for the proposed estimators.



3. Maximum likelihood estimation 



In the sequel, we assume that there are no ties among the observed death times (this
hypothesis is made to simplify notations, but the results below can be adapted to accommodate
ties). The likelihood function for observed data $O_i$, $i = 1, \ldots, n$, is given by



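$$L_n(\theta) = \prod_{i=1}^{n} \left[ \prod_{k=1}^{K} \left\{ \lambda_k(T_i)^{\Delta_i} \exp\left(\Delta_i \beta' X_i - e^{\beta' X_i} \Lambda_k(T_i)\right) \pi_{k,\gamma}(W_i) \right\}^{1\{S_i = k\}} \right]^{R_i}
\left[ \sum_{k=1}^{K} \lambda_k(T_i)^{\Delta_i} \exp\left(\Delta_i \beta' X_i - e^{\beta' X_i} \Lambda_k(T_i)\right) \pi_{k,\gamma}(W_i) \right]^{1 - R_i}. \tag{2}$$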



It would seem natural to derive a maximum likelihood estimator of $\theta_0$ by maximizing the
likelihood (2). However, the maximum of this function over the parameter space
$\Theta = \mathcal{B} \times \mathcal{G} \times \mathcal{A}^{\otimes K}$ does not exist. To see this, consider functions $\Lambda_k$ with fixed values at the
$T_i$, and let $(\partial \Lambda_k(t)/\partial t)|_{t = T_i} = \lambda_k(T_i)$ go to infinity for some $T_i$ with $\Delta_i R_i 1\{S_i = k\} = 1$ or
$\Delta_i (1 - R_i) = 1$.

To overcome this problem, we introduce a modified maximization space for (2), by
relaxing each $\Lambda_k(\cdot)$ to be an increasing right-continuous step function on $[0, \tau]$, with jumps at
the $T_i$'s such that $\Delta_i R_i 1\{S_i = k\} = 1$ or $\Delta_i (1 - R_i) = 1$. Estimators of $(\beta_0, \gamma_0, \Lambda_{k,0}; k \in \mathcal{K})$
will thus be derived by maximizing a modified version of (2), obtained by replacing $\lambda_k(T_i)$
in (2) with the jump size $\Lambda_k\{T_i\}$ of $\Lambda_k$ at $T_i$.

If they exist, these estimators will be referred to as nonparametric maximum likelihood
estimators (NPMLEs). We refer to Zeng and Lin (2007) for a review of the general principle
of NPML estimation, with application to various semiparametric regression models for
censored data (see also the numerous references therein). In our setting, existence of such
estimators is ensured by the following theorem (the proof is given in the Appendix):

Theorem 3.1. Under conditions (a)-(f), the NPMLE $\hat\theta_n = (\hat\beta_n, \hat\gamma_n, \hat\Lambda_{k,n}; k \in \mathcal{K})$ of $\theta_0$
exists and is achieved.

The problem of maximizing $L_n$ over the approximating space described above reduces to a
finite-dimensional problem, and the expectation-maximization (EM) algorithm (Dempster et al.,
1977) can be used to calculate the NPMLEs. For $1 \leq i \leq n$ and $k \in \mathcal{K}$, let $w_i(k, \theta)$
be the conditional probability that the $i$-th individual belongs to the $k$-th stratum given
$(T_i, \Delta_i, X_i, W_i)$ and the parameter value $\theta$, and let $Q(O_i, k, \theta)$ denote the conditional
expectation of $1\{S_i = k\}$ given $O_i$ and the parameter value $\theta$. Then $Q(O_i, k, \theta)$ has the
form

$$Q(O_i, k, \theta) = R_i 1\{S_i = k\} + (1 - R_i) w_i(k, \theta).$$
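
The E-step can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes that $w_i(k, \theta)$ is obtained by Bayes' rule from the $k$-th summand of the likelihood for a subject with missing stratum, with $\lambda_k(T_i)$ replaced by the jump size of $\Lambda_k$ at $T_i$ (the factor $\exp(\Delta_i \beta' X_i)$ is common to all strata and cancels). The function name `e_step` and the array layout are hypothetical.

```python
import numpy as np

def e_step(delta, lin_pred, jump, cum, pi, R, S):
    """Posterior stratum weights Q(O_i, k, theta), returned as an (n, K) array.

    delta    : (n,)   failure indicators Delta_i
    lin_pred : (n,)   linear predictors beta' X_i
    jump     : (n, K) jump size of Lambda_k at T_i
    cum      : (n, K) Lambda_k(T_i)
    pi       : (n, K) stratum probabilities pi_{k,gamma}(W_i)
    R        : (n,)   1 if the stratum is observed, 0 otherwise
    S        : (n,)   stratum index in {0, ..., K-1}; ignored where R == 0
    """
    delta = np.asarray(delta, dtype=float)[:, None]
    risk = np.exp(np.asarray(lin_pred, dtype=float))[:, None]
    jump = np.asarray(jump, dtype=float)
    cum = np.asarray(cum, dtype=float)
    pi = np.asarray(pi, dtype=float)
    R = np.asarray(R, dtype=int)
    # k-th summand of the likelihood, up to the common factor exp(Delta_i * beta' X_i).
    term = np.where(delta == 1, jump, 1.0) * np.exp(-risk * cum) * pi
    w = term / term.sum(axis=1, keepdims=True)
    # One-hot coding of the observed strata (placeholder index 0 where S is missing).
    s_idx = np.where(R == 1, np.asarray(S, dtype=int), 0)
    onehot = np.eye(pi.shape[1])[s_idx]
    return R[:, None] * onehot + (1 - R[:, None]) * w
```

Each row of the returned array sums to one: it is a one-hot vector when the stratum is observed and a posterior distribution over strata otherwise.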
In the M-step of the EM algorithm, we solve the complete-data score equation conditional
on the observed data. In particular, the following expression for the NPMLE of $\Lambda_k(\cdot)$ can
be obtained by: (a) taking the derivative, with respect to the jump sizes of $\Lambda_k(\cdot)$, of the
conditional expectation of the complete-data log-likelihood given the observed data and the
NPML estimator, (b) setting this derivative equal to 0:

Lemma 3.2. The NPMLE $\hat\theta_n$ satisfies the following equation for every $k \in \mathcal{K}$:

$$\hat\Lambda_{k,n}(t) = \int_0^t \sum_{i=1}^{n} \frac{Q(O_i, k, \hat\theta_n)}{\sum_{j=1}^{n} Q(O_j, k, \hat\theta_n) \exp(\hat\beta_n' X_j) Y_j(s)} \, dN_i(s), \quad 0 \leq t \leq \tau.$$

The details of the calculations are omitted (note how the suggested weights $w_i(k, \theta)$
naturally arise from the M-step of the EM algorithm). We refer the interested reader to
Zeng and Cai (2005) and Sugimoto and Hamasaki (2006), who recently described EM
algorithms for computing NPMLEs in various other semiparametric models with censored
data.
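
For given weights and regression coefficients, the equation of Lemma 3.2 is a weighted Breslow-type formula that can be evaluated directly. The sketch below is an illustration under stated assumptions: the function name `weighted_breslow` is hypothetical, there are no ties among the observed times, and the weights `Q` are taken as given (for instance from a previous E-step).

```python
import numpy as np

def weighted_breslow(T, delta, lin_pred, Q):
    """Weighted Breslow-type update of the cumulative baseline intensities.

    T        : (n,)   observed durations T_i
    delta    : (n,)   failure indicators Delta_i
    lin_pred : (n,)   linear predictors beta' X_i
    Q        : (n, K) weights Q(O_i, k, theta)
    Returns (jumps, cum), both (n, K): jump of Lambda_k at T_i and Lambda_k(T_i).
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=float)
    risk = np.exp(np.asarray(lin_pred, dtype=float))
    Q = np.asarray(Q, dtype=float)
    # at_risk[i, j] = Y_j(T_i) = 1{T_j >= T_i}
    at_risk = (T[None, :] >= T[:, None]).astype(float)
    # denom[i, k] = sum_j Q_jk * exp(beta' X_j) * Y_j(T_i)
    denom = at_risk @ (Q * risk[:, None])
    safe = np.where(denom > 0, denom, 1.0)
    jumps = np.where(denom > 0, delta[:, None] * Q / safe, 0.0)
    # cum[i, k] = sum of the jumps of Lambda_k at times T_j <= T_i
    leq = (T[None, :] <= T[:, None]).astype(float)
    cum = leq @ jumps
    return jumps, cum

# With a single stratum, unit weights and beta = 0, this is the Nelson-Aalen estimator.
jumps, cum = weighted_breslow([2.0, 1.0, 3.0], [1, 1, 1], [0.0, 0.0, 0.0], np.ones((3, 1)))
```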

In the sequel, we shall denote the conditional expectation of the complete-data
log-likelihood given the observed data and the NPML estimator by $E_{\hat\theta_n}[l_n(\theta)]$.



4. Asymptotic properties 



This section states the asymptotic properties of the proposed estimators. We first obtain 
the following theorem, which states the strong consistency of the proposed NPMLE. The 
proof is given in Appendix. 

Theorem 4.1. Under conditions (a)-(f), $\|\hat\beta_n - \beta_0\|$, $\|\hat\gamma_n - \gamma_0\|$, and $\sup_{t \in [0, \tau]} |\hat\Lambda_{k,n}(t) - \Lambda_{k,0}(t)|$
(for every $k \in \mathcal{K}$) converge to 0 almost surely as $n$ tends to infinity.

To derive the asymptotic normality of the proposed estimators, we adapt the function
analytic approach developed by Murphy (1995) for the frailty model (see also Chang et al.
(2005), Kosorok and Song (2007), and Lu (2008) for recent examples of this approach in
various other models).

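Instead of calculating score equations by differentiating $E_{\hat\theta_n}[l_n(\theta)]$ with respect to $\beta$, $\gamma$,
and the jump sizes of $\Lambda_k(\cdot)$, we consider one-dimensional submodels $\hat\theta_{n,\eta}$ passing through
$\hat\theta_n$ and we differentiate with respect to $\eta$. Precisely, we consider submodels of the form

$$\eta \mapsto \hat\theta_{n,\eta} = \left( \hat\beta_n + \eta h_\beta, \; \hat\gamma_n + \eta h_\gamma, \; \int_0^{\cdot} \{1 + \eta h_{\Lambda_k}(s)\} \, d\hat\Lambda_{k,n}(s); \; k \in \mathcal{K} \right),$$

where $h_\beta$ and $h_\gamma = (h_{\gamma_1}', \ldots, h_{\gamma_{K-1}}')'$ are $p$- and $q$-dimensional vectors respectively ($h_{\gamma_j} \in
\mathbb{R}^m$, $j = 1, \ldots, K-1$), and the $h_{\Lambda_k}$ ($k \in \mathcal{K}$) are functions on $[0, \tau]$. Let $h = (h_\beta, h_\gamma, h_{\Lambda_k}; k \in
\mathcal{K})$. To obtain the score equations, we differentiate $E_{\hat\theta_n}[l_n(\hat\theta_{n,\eta})]$ with respect to $\eta$ and we
evaluate at $\eta = 0$. $\hat\theta_n$ maximizes $E_{\hat\theta_n}[l_n(\theta)]$ and therefore satisfies
$(\partial E_{\hat\theta_n}[l_n(\hat\theta_{n,\eta})]/\partial \eta)|_{\eta = 0} = 0$
for every $h$, which leads to the score equation $S_n(\hat\theta_n)(h) = 0$, where $S_n(\hat\theta_n)(h)$ takes the
form

$$S_n(\hat\theta_n)(h) = P_n \left[ h_\beta' S_\beta(\hat\theta_n) + h_\gamma' S_\gamma(\hat\theta_n) + \sum_{k=1}^{K} S_{\Lambda_k}(\hat\theta_n)(h_{\Lambda_k}) \right], \tag{3}$$

where

$$S_\beta(\theta) = \Delta X - \sum_{k=1}^{K} Q(O, k, \theta) X \exp(\beta' X) \Lambda_k(T),$$

$$S_\gamma(\theta) = (S_{\gamma_1}(\theta)', \ldots, S_{\gamma_{K-1}}(\theta)')' \quad \text{with} \quad S_{\gamma_k}(\theta) = W \left[ Q(O, k, \theta) - \pi_{k,\gamma}(W) \right],$$

$$S_{\Lambda_k}(\theta)(h_{\Lambda_k}) = Q(O, k, \theta) \left[ h_{\Lambda_k}(T) \Delta - \exp(\beta' X) \int_0^T h_{\Lambda_k}(s) \, d\Lambda_k(s) \right].$$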



We take the space of elements $h = (h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K})$ to be

$$H = \{ (h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K}) : h_\beta \in \mathbb{R}^p, \|h_\beta\| < \infty; \; h_\gamma \in \mathbb{R}^q, \|h_\gamma\| < \infty;$$
$$h_{\Lambda_k} \text{ is a function defined on } [0, \tau], \; \|h_{\Lambda_k}\|_v < \infty, \; k \in \mathcal{K} \},$$

where $\|h_{\Lambda_k}\|_v$ denotes the total variation of $h_{\Lambda_k}$ on $[0, \tau]$. We further take the functions
$h_{\Lambda_k}$ to be continuous from the right at 0.

Define $\theta(h) = h_\beta' \beta + h_\gamma' \gamma + \sum_{k=1}^{K} \int_0^\tau h_{\Lambda_k}(s) \, d\Lambda_k(s)$, where $h \in H$. From this, the
parameter $\theta$ can be considered as a linear functional on $H$, and the parameter space $\Theta$ can
be viewed as a subset of the space $l^\infty(H)$ of bounded real-valued functions on $H$, which we
provide with the uniform norm. Moreover, the score operator $S_n$ appears to be a random
map from $\Theta$ to the space $l^\infty(H)$. Note that appropriate choices of $h$ make it possible to extract
all components of the original parameter $\theta$. For example, letting $h_\gamma = 0$, $h_{\Lambda_k}(\cdot) = 0$ for
every $k \in \mathcal{K}$, and $h_\beta$ be the $p$-dimensional vector with a one at the $i$-th location and zeros
elsewhere yields the $i$-th component of $\beta$. Letting $h_\beta = 0$, $h_\gamma = 0$, $h_{\Lambda_k}(\cdot) = 0$ for every
$k \in \mathcal{K}$ except $h_{\Lambda_j}(s) = 1\{s \leq t\}$ (for some $t \in (0, \tau)$) yields $\Lambda_j(t)$.

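We need some further notations to state the asymptotic normality of the NPMLE of $\beta_0$.
Let us first define the linear operator $\sigma = (\sigma_\beta, \sigma_\gamma, \sigma_{\Lambda_k}; k \in \mathcal{K}) : H \to H$ by

$$\sigma_\beta(h) = P_{\theta_0} \left[ 2 X \psi(O, \theta_0) \sum_{k=1}^{K} Q(O, k, \theta_0) h_{\Lambda_k}(T) \right]
+ P_{\theta_0} \left[ \psi(O, \theta_0) X \left\{ \psi(O, \theta_0) X' h_\beta + S_\gamma(\theta_0)' h_\gamma \right\} \right],$$

$$\sigma_\gamma(h) = P_{\theta_0} \left[ 2 S_\gamma(\theta_0) \Delta \sum_{k=1}^{K} Q(O, k, \theta_0) h_{\Lambda_k}(T) \right] + P_{\theta_0} \left[ S_\gamma(\theta_0) S_\gamma(\theta_0)' \right] h_\gamma
+ P_{\theta_0} \left[ \psi(O, \theta_0) S_\gamma(\theta_0) X' \right] h_\beta,$$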
<^A, {h){u) = hA, {u)Pe, [Q{0, k, eo)<l>{u, O, k, e^)\ 





h'pPe^ [2XV(0, eo)Q{0, k, ^o)e'5°^F(w) 
h'^Pe, \2S^{9o)Q{0,k,eo)e'^'^''Y{u)\ , 
where $\phi(u, O, k, \theta_0) = Y(u) Q(O, k, \theta_0) e^{\beta_0' X}$ and $\psi(O, \theta_0) = \Delta - \sum_{k=1}^{K} Q(O, k, \theta_0) e^{\beta_0' X} \Lambda_{k,0}(T)$.
This operator is continuously invertible (see the corresponding lemma in the Appendix). We shall
denote its inverse by $\sigma^{-1} = (\sigma_\beta^{-1}, \sigma_\gamma^{-1}, \sigma_{\Lambda_k}^{-1}; k \in \mathcal{K})$.

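Next, for every $r \in \mathbb{N} \setminus \{0\}$, the $r$-dimensional column vector having all its
components equal to 0 will be noted by $0_r$ (or by 0 when no confusion may occur). Let $h =
(h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K}) \in H$. If $h_\gamma = 0_q$ and $h_{\Lambda_k}$ is identically equal to 0 for every $k \in \mathcal{K}$, we
note $h = (h_\beta, 0, 0; k \in \mathcal{K})$. Let $\tilde\sigma_\beta^{-1} : \mathbb{R}^p \to \mathbb{R}^p$ be the linear map defined by
$\tilde\sigma_\beta^{-1}(u) = \sigma_\beta^{-1}((u, 0, 0; k \in \mathcal{K}))$, for $u \in \mathbb{R}^p$. Let $\{e_1, \ldots, e_p\}$ be the canonical basis of $\mathbb{R}^p$.
Then the following result holds (its proof is given in the Appendix).

Theorem 4.2. Under conditions (a)-(f), $\sqrt{n}(\hat\beta_n - \beta_0)$ has an asymptotic normal
distribution $N(0, \Sigma_\beta)$, where

$$\Sigma_\beta = (\tilde\sigma_\beta^{-1}(e_1), \ldots, \tilde\sigma_\beta^{-1}(e_p))$$

is the efficient variance in estimating $\beta_0$.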

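Remark 4. Although $\gamma_0$ and the cumulative baseline intensity functions $\Lambda_{k,0}$ ($k \in \mathcal{K}$)
are not the primary parameters of interest, we may also state an asymptotic normality
result for their NPMLEs. This requires some further notations.

Define $\tilde\sigma_\gamma^{-1} : \mathbb{R}^q \to \mathbb{R}^q$ by $\tilde\sigma_\gamma^{-1}(u) = \sigma_\gamma^{-1}((0, u, 0; k \in \mathcal{K}))$, let $\{f_1, \ldots, f_q\}$ be the canonical
basis of $\mathbb{R}^q$, and define $\Sigma_\gamma = (\tilde\sigma_\gamma^{-1}(f_1), \ldots, \tilde\sigma_\gamma^{-1}(f_q))$. Finally, let $h_{(j,t)} = (h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K})$
be such that $h_\beta = 0$, $h_\gamma = 0$, $h_{\Lambda_j}(\cdot) = 1\{\cdot \leq t\}$ for some $t \in (0, \tau)$ and $j \in \mathcal{K}$, and $h_{\Lambda_k} = 0$
for every $k \in \mathcal{K}$, $k \neq j$. Then the following holds (a brief sketch of the proof is given in the
Appendix):

Theorem 4.3. Assume that conditions (a)-(f) hold. Then $\sqrt{n}(\hat\gamma_n - \gamma_0)$ has an
asymptotic normal distribution $N(0, \Sigma_\gamma)$. Furthermore, for any $t \in (0, \tau)$ and $j \in \mathcal{K}$,
$\sqrt{n}(\hat\Lambda_{j,n}(t) - \Lambda_{j,0}(t))$ is asymptotically distributed as a $N(0, v_j^2(t))$, where

$$v_j^2(t) = \int_0^t \sigma_{\Lambda_j}^{-1}(h_{(j,t)})(u) \, d\Lambda_{j,0}(u).$$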

We now turn to the issue of estimating the asymptotic variances of the estimators $\hat\beta_n$,
$\hat\gamma_n$, and $\hat\Lambda_{j,n}(t)$ ($t \in (0, \tau)$, $j \in \mathcal{K}$). It turns out that the asymptotic variances $\Sigma_\beta$, $\Sigma_\gamma$,
and $v_j^2(t)$ are not expressible in explicit forms, since the inverse $\sigma^{-1}$ has no closed form.
However, this is not a problem if we can provide consistent estimators for them. Such
estimators are defined below.

For $i = 1, \ldots, n$, let $X_{ir}$ denote the $r$-th ($r = 1, \ldots, p$) component of $X_i$, $S_{\gamma,i}(\theta)$ be
defined as $S_\gamma(\theta)$ above with $O$ and $W$ replaced by $O_i$ and $W_i$ respectively, and $S_{\gamma,i,s}(\theta)$ be the
$s$-th ($s = 1, \ldots, q$) component of $S_{\gamma,i}(\theta)$. Using these notations, we define the following
block matrix

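$$A_n = \begin{pmatrix} A^{\beta\beta} & A^{\beta\gamma} & A^{\beta\Lambda} \\ A^{\gamma\beta} & A^{\gamma\gamma} & A^{\gamma\Lambda} \\ A^{\Lambda\beta} & A^{\Lambda\gamma} & A^{\Lambda\Lambda} \end{pmatrix} \tag{4}$$

where the sub-matrices $A^{\beta\beta}$, $A^{\gamma\gamma}$, $A^{\beta\gamma}$, and $A^{\gamma\beta}$ are defined as follows by their $(r, s)$-th
component:

$$A^{\beta\beta}_{rs} = \frac{1}{n} \sum_{i=1}^{n} \{\psi(O_i, \hat\theta_n)\}^2 X_{ir} X_{is}, \quad r, s = 1, \ldots, p,$$

$$A^{\gamma\gamma}_{rs} = \frac{1}{n} \sum_{i=1}^{n} S_{\gamma,i,r}(\hat\theta_n) S_{\gamma,i,s}(\hat\theta_n), \quad r, s = 1, \ldots, q,$$

$$A^{\beta\gamma}_{rs} = \frac{1}{n} \sum_{i=1}^{n} \psi(O_i, \hat\theta_n) X_{ir} S_{\gamma,i,s}(\hat\theta_n), \quad r = 1, \ldots, p, \; s = 1, \ldots, q,$$

$$A^{\gamma\beta}_{rs} = A^{\beta\gamma}_{sr}, \quad r = 1, \ldots, q, \; s = 1, \ldots, p.$$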

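Define the block matrices $A^{\beta\Lambda} = (A^{\beta\Lambda_1}, \ldots, A^{\beta\Lambda_K})$ and $A^{\gamma\Lambda} = (A^{\gamma\Lambda_1}, \ldots, A^{\gamma\Lambda_K})$, where
for every $k \in \mathcal{K}$, the sub-matrices $A^{\beta\Lambda_k}$ and $A^{\gamma\Lambda_k}$ are defined by

$$A^{\beta\Lambda_k}_{rs} = \frac{2}{n} X_{sr} \Delta_s \psi(O_s, \hat\theta_n) Q(O_s, k, \hat\theta_n), \quad r = 1, \ldots, p, \; s = 1, \ldots, n,$$

$$A^{\gamma\Lambda_k}_{rs} = \frac{2}{n} S_{\gamma,s,r}(\hat\theta_n) \Delta_s Q(O_s, k, \hat\theta_n), \quad r = 1, \ldots, q, \; s = 1, \ldots, n.$$

Define also the block matrices

$$A^{\Lambda\beta} = \begin{pmatrix} A^{\Lambda_1 \beta} \\ \vdots \\ A^{\Lambda_K \beta} \end{pmatrix}, \quad
A^{\Lambda\gamma} = \begin{pmatrix} A^{\Lambda_1 \gamma} \\ \vdots \\ A^{\Lambda_K \gamma} \end{pmatrix}, \quad
A^{\Lambda\Lambda} = \begin{pmatrix} A^{\Lambda_1 \Lambda_1} & \cdots & A^{\Lambda_1 \Lambda_K} \\ \vdots & & \vdots \\ A^{\Lambda_K \Lambda_1} & \cdots & A^{\Lambda_K \Lambda_K} \end{pmatrix},$$

where for every $j, k \in \mathcal{K}$,

$$A^{\Lambda_k \beta}_{rs} = -\frac{1}{n} \sum_{i=1}^{n} 2 X_{is} \psi(O_i, \hat\theta_n) Q(O_i, k, \hat\theta_n) e^{\hat\beta_n' X_i} Y_i(T_r), \quad r = 1, \ldots, n, \; s = 1, \ldots, p,$$

$$A^{\Lambda_k \gamma}_{rs} = -\frac{1}{n} \sum_{i=1}^{n} 2 S_{\gamma,i,s}(\hat\theta_n) Q(O_i, k, \hat\theta_n) e^{\hat\beta_n' X_i} Y_i(T_r), \quad r = 1, \ldots, n, \; s = 1, \ldots, q,$$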

A^s'^' = Hj = k}l{r = s}- V Q{Oi, k, 0n)4>{Tr, Oi, k, 9n) 

n t— ' 

1=1 

/ 1 " 

> fc} l{r = s}- V 20(r„ Oi, A:, 9n)Q{Oi,j, 0„) 

V "lit 

2 " 
i=i 

-^<^(T^, a, k, eMOsJ, ^„)A,^ , r, s = 1, . . . , n, 

and $\Delta\hat\Lambda_{j,n}(T_s)$ is the jump size of $\hat\Lambda_{j,n}$ at $T_s$, that is, $\Delta\hat\Lambda_{j,n}(T_s) = \hat\Lambda_{j,n}(T_s) - \hat\Lambda_{j,n}(T_s-)$
($j \in \mathcal{K}$, $s = 1, \ldots, n$). Note that for notational simplicity, the lower (sample size) index $n$
has been omitted in the notations for the sub-matrices of $A_n$.
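Now, define

$$\Sigma_{\beta,n} = \Big\{ A^{\beta\beta} - A^{\beta\gamma}(A^{\gamma\gamma})^{-1} A^{\gamma\beta} - \big(A^{\beta\Lambda} - A^{\beta\gamma}(A^{\gamma\gamma})^{-1} A^{\gamma\Lambda}\big)
\times \big(A^{\Lambda\Lambda} - A^{\Lambda\gamma}(A^{\gamma\gamma})^{-1} A^{\gamma\Lambda}\big)^{-1} \big(A^{\Lambda\beta} - A^{\Lambda\gamma}(A^{\gamma\gamma})^{-1} A^{\gamma\beta}\big) \Big\}^{-1},$$

$$\Sigma_{\gamma,n} = \Big\{ A^{\gamma\gamma} - A^{\gamma\beta}(A^{\beta\beta})^{-1} A^{\beta\gamma} - \big(A^{\gamma\Lambda} - A^{\gamma\beta}(A^{\beta\beta})^{-1} A^{\beta\Lambda}\big)
\times \big(A^{\Lambda\Lambda} - A^{\Lambda\beta}(A^{\beta\beta})^{-1} A^{\beta\Lambda}\big)^{-1} \big(A^{\Lambda\gamma} - A^{\Lambda\beta}(A^{\beta\beta})^{-1} A^{\beta\gamma}\big) \Big\}^{-1},$$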

and 

Then the following holds: 

Theorem 4.4. Under conditions (a)-(f), $\Sigma_{\beta,n}$ and $\Sigma_{\gamma,n}$ converge in probability to $\Sigma_\beta$
and $\Sigma_\gamma$ respectively as $n$ tends to $\infty$. Moreover, for $t \in (0, \tau)$ and $j \in \mathcal{K}$, let

«j>(0 = ^0",t)^^'"^("-,*)' 

w/iere 

= (oo-i)„,AA;;(Ti)i{Ti<n,-..,AA^(r„)i{r„<t},o'(^^^^^^ 

and 

Then $\hat v_{j,n}^2(t)$ converges in probability to $v_j^2(t)$ as $n$ tends to $\infty$.
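
The nested expression defining the variance estimator for the regression parameter has the form of the inverse of a Schur complement, which by the block matrix inversion formula is the leading $p \times p$ block of the inverse of the full matrix. The following Python sketch checks this identity numerically on an arbitrary well-conditioned matrix. The function name `beta_block_of_inverse` is hypothetical, the test matrix is random rather than computed from data, and the ordering of the blocks ($\beta$ first, then $\gamma$, then $\Lambda$) is an assumption.

```python
import numpy as np

def beta_block_of_inverse(A, p, q):
    """Inverse of the nested Schur complement of the beta-beta block of A.

    A is partitioned into blocks of sizes p (beta), q (gamma) and the remainder (Lambda).
    The gamma block is eliminated first, then the Lambda block.
    """
    A = np.asarray(A, dtype=float)
    b = slice(0, p)
    g = slice(p, p + q)
    lam = slice(p + q, A.shape[0])
    Agg_inv = np.linalg.inv(A[g, g])

    def reduced(rows, cols):
        # Block of A after eliminating gamma: A_rc - A_rg (A_gg)^{-1} A_gc
        return A[rows, cols] - A[rows, g] @ Agg_inv @ A[g, cols]

    schur = reduced(b, b) - reduced(b, lam) @ np.linalg.solve(reduced(lam, lam), reduced(lam, b))
    return np.linalg.inv(schur)

rng = np.random.default_rng(0)
M = rng.normal(size=(7, 7))
A = M @ M.T + 7.0 * np.eye(7)          # symmetric positive definite test matrix
Sigma_beta = beta_block_of_inverse(A, p=2, q=3)
direct = np.linalg.inv(A)[:2, :2]      # same quantity from a direct inversion
```

When the $\Lambda$ block is large, working with the reduced blocks avoids forming the inverse of the full matrix, although the $\Lambda\Lambda$ system still has to be solved.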



5. Discussion 



In this paper, we have constructed consistent and asymptotically normal estimators for the
stratified proportional intensity regression model when the sample stratum information is
only partially available. The proposed estimator for the regression parameter of interest in
this model has been shown to be semiparametrically efficient. Although computationally
more challenging, these estimators improve on the ones previously investigated in the literature,
such as the regression calibration estimators (Dupuy and Leconte, 2008).

We have obtained explicit (and computationally fairly simple) formulas for consistent
estimators of the asymptotic variances. These formulas may however require the inversion
of potentially large matrices. For a large sample, this inversion may be unstable.
An alternative solution relies on numerical differentiation of the profile log-likelihood (see
Murphy et al. (1997) and Chen and Little (1999) for example). Note that with this latter
method however, no estimator is available for the asymptotic variance of the cumulative
baseline intensity estimator. Some further work is needed to evaluate the numerical
performance of the proposed estimators. This is a subject for future research, and requires
extensive simulation work which falls beyond the scope of this paper.

In this paper, a multinomial logistic model (Jobson, 1992) is used for modeling the
conditional stratum probabilities given covariates. This choice was mainly motivated by
the fact that this model is commonly used in medical research for modeling the relationship
between a categorical response and covariates. The theoretical results developed here can
be extended to the case of other link functions. In addition, the covariate $X$ in model (1)
is assumed to be time independent, for convenience. This assumption can be relaxed to
accommodate time-varying covariates, provided that appropriate regularity conditions are
made.

Appendix A. Proofs of Theorems 

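A.1 Proof of Theorem 3.1

For every $k \in \mathcal{K}$, define $\mathcal{I}_k^n = \{i \in \{1, \ldots, n\} \mid \Delta_i R_i 1\{S_i = k\} = 1 \text{ or } \Delta_i (1 - R_i) = 1\}$,
and let $i_k$ denote the cardinality of $\mathcal{I}_k^n$. Let $i_n = \sum_{k=1}^{K} i_k$. Consider the set of times
$\{T_i, i \in \mathcal{I}_k^n\}$. Let $t_{(k,1)} < \ldots < t_{(k,i_k)}$ denote the ordered failure times in this set. For any
given sample size $n$, the NPML estimation method consists in maximizing $L_n$ in (2) over
the approximating parameter space

$$\Theta_n = \left\{ (\beta, \gamma, \Lambda_k\{t_{(k,j)}\}) : \beta \in \mathcal{B}; \; \gamma \in \mathcal{G}; \; \Lambda_k\{t_{(k,j)}\} \in [0, \infty), \; j = 1, \ldots, i_k, \; k \in \mathcal{K} \right\}.$$

Suppose first that $\Lambda_k\{t_{(k,j)}\} \leq M < \infty$, for $j = 1, \ldots, i_k$ and $k \in \mathcal{K}$. Since $L_n$ is a
continuous function of $\beta$, $\gamma$, and the $\Lambda_k\{t_{(k,j)}\}$'s on the compact set $\mathcal{B} \times \mathcal{G} \times [0, M]^{i_n}$, $L_n$
achieves its maximum on this set.

To show that a maximum of $L_n$ exists on $\mathcal{B} \times \mathcal{G} \times [0, \infty)^{i_n}$, we show that there exists a finite
$M$ such that for all $\theta^M = (\beta^M, \gamma^M, (\Lambda_k^M\{t_{(k,j)}\})_{j,k}) \in (\mathcal{B} \times \mathcal{G} \times [0, \infty)^{i_n}) \setminus (\mathcal{B} \times \mathcal{G} \times [0, M]^{i_n})$,
there exists a $\theta = (\beta, \gamma, (\Lambda_k\{t_{(k,j)}\})_{j,k}) \in \mathcal{B} \times \mathcal{G} \times [0, M]^{i_n}$ such that $L_n(\theta) > L_n(\theta^M)$. A
proof by contradiction is adopted for that purpose.

Assume that for all $M < \infty$, there exists $\theta^M \in (\mathcal{B} \times \mathcal{G} \times [0, \infty)^{i_n}) \setminus (\mathcal{B} \times \mathcal{G} \times [0, M]^{i_n})$ such
that for all $\theta \in \mathcal{B} \times \mathcal{G} \times [0, M]^{i_n}$, $L_n(\theta) \leq L_n(\theta^M)$. It can be seen that $L_n$ is bounded above

by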

K 

, Ai/?,il{S'i=fc} 



W {c3Afc{r,}}^'^'^^^'-"^exp -C2i?a{5, = k}Y,Kk{hk.,)}l{t(k.j) < 

k=l \ 3 = 1 



i=l 

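If $\theta^M \in (\mathcal{B} \times \mathcal{G} \times [0, \infty)^{i_n}) \setminus (\mathcal{B} \times \mathcal{G} \times [0, M]^{i_n})$, then there exists $l \in \mathcal{K}$ and $p \in \{1, \ldots, i_l\}$ such
that $\Lambda_l^M\{t_{(l,p)}\} > M$. By assumption (d), there exists at least one individual with index $i_l$
($i_l \in \{1, \ldots, n\}$) such that $1\{S_{i_l} = l\} = 1$, $Y_{i_l}(\tau) = 1$ (and therefore $t_{(l,p)} \leq T_{i_l} = \tau$),
and $R_{i_l} = 1$. Hence the corresponding exponential factor in the upper bound above is at most $e^{-c_2 M}$.

It follows that the upper bound of $L_n(\theta^M)$ (and therefore $L_n(\theta^M)$ itself) can be made as
close to 0 as desired by increasing $M$. This is the desired contradiction.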



□ 



A.2 Proof of Theorem 4.1

We adapt the techniques developed by Murphy (1994), in order to prove consistency of
our proposed estimator $\hat\theta_n$. The proof essentially consists of three steps: (i) for every
$k \in \mathcal{K}$, we show that the sequence $\hat\Lambda_{k,n}(\tau)$ is almost surely bounded as $n$ goes to infinity,
(ii) we show that every subsequence of $n$ contains a further subsequence along which the
NPMLE $\hat\theta_n$ converges, (iii) we show that the limit of every convergent subsequence of $\hat\theta_n$ is
$\theta_0$.


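Proof of (i). Note first that for all $s \in [0, \tau]$ and $k \in \mathcal{K}$,
$\frac{1}{n} \sum_{i=1}^{n} Q(O_i, k, \hat\theta_n) e^{\hat\beta_n' X_i} Y_i(s) \geq c_2 \frac{1}{n} \sum_{i=1}^{n} R_i 1\{S_i = k\} Y_i(\tau)$.
Moreover, $Q(O_i, k, \hat\theta_n)$ is bounded by 1. It follows that for all $k \in \mathcal{K}$,

$$0 \leq \hat\Lambda_{k,n}(\tau) \leq \frac{1}{c_2} \int_0^\tau \frac{d\bar N_n(s)}{\frac{1}{n} \sum_{i=1}^{n} R_i 1\{S_i = k\} Y_i(\tau)}
= \frac{1}{c_2} \frac{\frac{1}{n} \sum_{i=1}^{n} \Delta_i}{\frac{1}{n} \sum_{i=1}^{n} R_i 1\{S_i = k\} Y_i(\tau)},$$

where $\bar N_n(s) = n^{-1} \sum_{i=1}^{n} N_i(s)$. Next, $\frac{1}{n} \sum_{i=1}^{n} R_i 1\{S_i = k\} Y_i(\tau)$ converges almost surely
to $P_{\theta_0}[R 1\{S = k\} Y(\tau)] > c_4 > 0$. Therefore, for each $k \in \mathcal{K}$, as $n$ goes to infinity, $\hat\Lambda_{k,n}(\tau)$ is
bounded above almost surely by $1/(c_2 c_4)$.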


Proof of (ii). If (i) holds, by Helly's theorem (see Loève (1963), p. 179), every subsequence of
$n$ has a further subsequence along which $\hat\Lambda_{1,n}$ converges weakly to some nondecreasing
right-continuous function $\Lambda_1^*$, with probability 1. By successive extractions of sub-subsequences,
we can further find a subsequence (say $n_j$) such that $\hat\Lambda_{k,n_j}$ converges weakly to some
nondecreasing right-continuous function $\Lambda_k^*$, for every $k \in \mathcal{K}$, with probability 1. By the
compactness of $\mathcal{B} \times \mathcal{G}$, we can further find a subsequence of $n_j$ (we shall still denote it by
$n_j$ for simplicity of notations) such that $\hat\Lambda_{k,n_j}$ converges weakly to $\Lambda_k^*$ (for every $k \in \mathcal{K}$)
and $(\hat\beta_{n_j}, \hat\gamma_{n_j})$ converges to some $(\beta^*, \gamma^*)$, with probability 1. We now show that the $\Lambda_k^*$'s
must be continuous on $[0, \tau]$.

Let "0 be any nonnegative, bounded, continuous function. Then, for any given fc e /C, 



V^(s)dA*(.)= / i:{s)d{Al{s)-Ak,n,{s)} 
Jo 

-1-1 



-y^QiOi,kXy-^'''Yiis) 



"2 , 



1=1 



— y^Q{0,,kX,)dN,{s) 



< 



^(s)d{A^(s)-A,,„^.(s)}+ / ^(s) 



H -1 



^y^Rii{Si^k}Yiii 



dNn^is). 



By the Helly-Bray Lemma fsee iLoevd (|l963[ ). pl80), /J" V'(s) d{A^(s) - Afc,„^.(s)} 
as j 00. Moreover, Nuji') and -^^27=1 ^i^i^i = k}Yi{-) converge almost surely in 
supremum norm to 



K 

E 

fc=i 



Pff, 



1{S = fc}e^o^r(s) dAkflis) and Pg, [R1{S = k}Y{-)] 



respectively, where the latter term is bounded away from on s € [0, r] by a ssumption 
(d). Thus, by applying the extended version of the Helly-Bray Lemma (stated by lKorsholm 
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( 19981 ) for example) to the second term on the right-hand side of the previous inequahty, 
we get that 

Hs)dAl{s) (5) 

<C2 / i;{s){Pg„[Rl{S = k}Y{s)]}-'y2PeMS ^ k}e'''«''Y{s)]Xk,o{s)ds. 

k=i 

Suppose that $\Lambda_k^*$ has discontinuities, and let $\psi$ be close to 0 except at the jump points of $\Lambda_k^*$, where it is allowed to have high and thin peaks. While the right-hand side of inequality (5) should be close to 0 ($\Lambda_{k,0}$ is continuous by assumption (b)), its left-hand side can be made arbitrarily large, yielding a contradiction. Thus $\Lambda_k^*$ must be continuous ($k \in \mathcal{K}$). A second conclusion, arising from Dini's theorem, is that $\hat\Lambda_{k,n_j}$ converges uniformly to $\Lambda_k^*$ ($k \in \mathcal{K}$), with probability 1. To summarize: for any given subsequence of $n$, we have found a further subsequence $n_j$ and an element $(\beta^*, \gamma^*, \Lambda_k^*; k \in \mathcal{K})$ such that $\|\hat\beta_{n_j} - \beta^*\|$, $\|\hat\gamma_{n_j} - \gamma^*\|$, and $\sup_{t \in [0,\tau]} |\hat\Lambda_{k,n_j}(t) - \Lambda_k^*(t)|$ (for every $k \in \mathcal{K}$) converge to 0 almost surely.
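The step from pointwise to uniform convergence rests on monotonicity: for nondecreasing functions with a continuous limit, the sup-norm error is controlled by the error on a finite grid plus the largest increment of the limit over that grid. The following is a minimal sketch of that bound, not part of the original proof. The step functions `step_approximation(n)` and the limit `identity` are hypothetical stand-ins for $\hat\Lambda_{k,n_j}$ and $\Lambda_k^*$ on $[0, 1]$.

```python
import math


def sup_bound_on_grid(f_n, f, grid):
    """Bound sup |f_n - f| over [grid[0], grid[-1]] for nondecreasing f_n and f.

    For t in [a, b], monotonicity gives
        f_n(a) - f(b) <= f_n(t) - f(t) <= f_n(b) - f(a),
    so the sup is at most (largest error on the grid)
    plus (largest increment of f between neighbouring grid points).
    """
    grid_error = max(abs(f_n(t) - f(t)) for t in grid)
    mesh_increment = max(f(b) - f(a) for a, b in zip(grid, grid[1:]))
    return grid_error + mesh_increment


def step_approximation(n):
    """Hypothetical nondecreasing step function converging pointwise to t."""
    return lambda t: math.floor(n * t) / n


def identity(t):
    """Continuous nondecreasing limit (stand-in for the limiting function)."""
    return t


grid = [i / 100 for i in range(101)]
for n in (10, 100, 1000):
    # The bound shrinks as n grows and as the grid is refined.
    print(n, sup_bound_on_grid(step_approximation(n), identity, grid))
```

Refining the grid makes the second term small because the limit is continuous on a compact interval, and pointwise convergence on the finitely many grid points makes the first term small.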

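Proof of (iii). To prove (iii), we first define the random step functions

$$
\tilde\Lambda_{k,n}(t) = \int_0^t \frac{\frac{1}{n} \sum_{i=1}^{n} Q(O_i, k, \theta_0)\, dN_i(s)}{\frac{1}{n} \sum_{j=1}^{n} Q(O_j, k, \theta_0) \exp(\beta_0' X_j)\, Y_j(s)}, \qquad 0 \le t \le \tau,\ k \in \mathcal{K},
$$

and we show that for every $k \in \mathcal{K}$, $\tilde\Lambda_{k,n}$ almost surely converges uniformly to $\Lambda_{k,0}$ on $[0, \tau]$. First, note that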



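$$
\begin{aligned}
&\sup_{t \in [0,\tau]} \left| \tilde\Lambda_{k,n}(t) - P_{\theta_0}\left[ \frac{\Delta 1\{T \le t\}\, Q(O, k, \theta_0)}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right] \right| \\
&\quad \le \sup_{t \in [0,\tau]} \left| P_n\left[ \Delta 1\{T \le t\}\, Q(O, k, \theta_0) \left\{ \frac{1}{P_n[Q(O, k, \theta_0) e^{\beta_0' X} Y(s)]} - \frac{1}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]} \right\}\bigg|_{s=T} \right] \right| \\
&\qquad + \sup_{t \in [0,\tau]} \left| (P_n - P_{\theta_0}) \left[ \frac{\Delta 1\{T \le t\}\, Q(O, k, \theta_0)}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right] \right| \\
&\quad \le \sup_{s \in [0,\tau]} \left| \frac{1}{P_n[Q(O, k, \theta_0) e^{\beta_0' X} Y(s)]} - \frac{1}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]} \right| \\
&\qquad + \sup_{t \in [0,\tau]} \left| (P_n - P_{\theta_0}) \left[ \frac{\Delta 1\{T \le t\}\, Q(O, k, \theta_0)}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right] \right|.
\end{aligned}
\tag{6}
$$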



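The class $\{Y(s) : s \in [0, \tau]\}$ is Donsker and $Q(O, k, \theta_0) e^{\beta_0' X}$ is a bounded measurable function, hence $\{Q(O, k, \theta_0) e^{\beta_0' X} Y(s) : s \in [0, \tau]\}$ is Donsker (Corollary 9.31, Kosorok (2007)), and therefore Glivenko-Cantelli. Moreover, $P_{\theta_0}[Q(O, k, \theta_0) e^{\beta_0' X} Y(s)] = P_{\theta_0}[P_{\theta_0}[1\{S = k\} \mid O]\, e^{\beta_0' X} Y(s)] = P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]$. Thus

$$
\sup_{s \in [0,\tau]} \left| P_n\left[ Q(O, k, \theta_0) e^{\beta_0' X} Y(s) \right] - P_{\theta_0}\left[ 1\{S = k\} e^{\beta_0' X} Y(s) \right] \right|
$$

converges to 0 a.e. Next, $P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]$ is larger than $c_2\, P_{\theta_0}[1\{S = k\} Y(\tau)]$ and thus, by assumption (d), $P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)] > 0$. It follows that the first term on the right-hand side of inequality (6) converges to 0 a.e. Similar arguments show that the class $\{\Delta 1\{T \le t\}\, Q(O, k, \theta_0) / P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]|_{s=T} : t \in [0, \tau]\}$ is also a Glivenko-Cantelli class, and therefore $\tilde\Lambda_{k,n}$ almost surely converges uniformly to

$$
P_{\theta_0}\left[ \frac{\Delta 1\{T \le t\}\, Q(O, k, \theta_0)}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right].
$$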



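Now, note that

$$
\Lambda_{k,0}(t) = \int_0^t \frac{P_{\theta_0}[1\{S = k\}\, dN(s)]}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]}
= P_{\theta_0}\left[ \frac{1\{S = k\}\, \Delta 1\{T \le t\}}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right],
$$

which can be reexpressed as

$$
P_{\theta_0}\left[ \frac{\Delta 1\{T \le t\}\, Q(O, k, \theta_0)}{P_{\theta_0}[1\{S = k\} e^{\beta_0' X} Y(s)]\big|_{s=T}} \right].
$$

Thus $\tilde\Lambda_{k,n}$ almost surely converges uniformly to $\Lambda_{k,0}$ on $[0, \tau]$.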
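To make the uniform convergence concrete, consider the special case in which every stratum is observed ($R \equiv 1$, so that $Q(O, k, \theta_0) = 1\{S = k\}$) and $\beta_0 = 0$. The step function $\tilde\Lambda_{k,n}$ then reduces to the within-stratum Nelson-Aalen estimator. The simulation below is an illustrative sketch only: the exponential failure and censoring distributions, the rates, and the function names are assumptions made for the example, not quantities from the paper.

```python
import random


def stratum_nelson_aalen(times, events, strata, k):
    """Jump points (time, cumulative hazard) of the Nelson-Aalen estimator in stratum k.

    Assumes no tied times. Each observed failure adds
    1 / (number still at risk in stratum k) to the cumulative hazard.
    """
    sample = sorted((t, d) for t, d, s in zip(times, events, strata) if s == k)
    at_risk = len(sample)
    cumulative = 0.0
    path = []
    for t, d in sample:
        if d:
            cumulative += 1.0 / at_risk
            path.append((t, cumulative))
        at_risk -= 1
    return path


def sup_error(path, rate, tau):
    """sup over [0, tau] of |step function - rate * t| (true cumulative hazard rate * t)."""
    worst, previous = 0.0, 0.0
    for t, value in path:
        if t > tau:
            break
        # The step function equals `previous` just before t and `value` at t.
        worst = max(worst, abs(previous - rate * t), abs(value - rate * t))
        previous = value
    return max(worst, abs(previous - rate * tau))


def simulate(n, rates, censoring_rate, rng):
    """Draw (T, Delta, S): exponential failure times per stratum, exponential censoring."""
    times, events, strata = [], [], []
    for _ in range(n):
        s = rng.randrange(len(rates))
        failure = rng.expovariate(rates[s])
        censoring = rng.expovariate(censoring_rate)
        times.append(min(failure, censoring))
        events.append(failure <= censoring)
        strata.append(s)
    return times, events, strata


rng = random.Random(0)
rates = [1.0, 2.0]  # hypothetical constant baseline intensities, one per stratum
for n in (200, 20000):
    times, events, strata = simulate(n, rates, 0.5, rng)
    errors = [
        sup_error(stratum_nelson_aalen(times, events, strata, k), rates[k], 1.0)
        for k in range(len(rates))
    ]
    print(n, errors)
```

With missing strata the indicator $1\{S = k\}$ is replaced by the weight $Q(O, k, \theta_0)$ in both the jumps and the at-risk average, but the Glivenko-Cantelli argument is the same.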



Next, using somewhat standard arguments (see Parner (1998) for example), we can show that $0 \le n_j^{-1} \{\log L_{n_j}(\hat\theta_{n_j}) - \log L_{n_j}(\tilde\theta_{n_j})\}$ converges to the negative Kullback-Leibler information $P_{\theta_0}[\log(L_1(\theta^*) / L_1(\theta_0))]$. Since this limit is nonnegative while the Kullback-Leibler information is itself nonnegative, the Kullback-Leibler information must be zero, and it follows that, with probability 1, $L_1(\theta^*) = L_1(\theta_0)$. The proof of consistency is completed if we show that this equality implies $\theta^* = \theta_0$. For that purpose, consider $L_1(\theta^*) = L_1(\theta_0)$ under $\Delta = 1$, $R = 1$, and $1\{S = k\} = 1$ (for each $k \in \mathcal{K}$ in turn). Note that this is possible by assumption (c). This yields the following equation for almost all $t \in [0, \tau]$, $\|x\| \le c_1$, $\|w\| \le c_1$:



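$$
\log \frac{\lambda_k^*(t)}{\lambda_{k,0}(t)} + (\beta^* - \beta_0)' x - \Lambda_k^*(t)\, e^{\beta^{*\prime} x} + \Lambda_{k,0}(t)\, e^{\beta_0' x} + \log \frac{\pi_{k,\gamma^*}(w)}{\pi_{k,\gamma_0}(w)} = 0.
$$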



This equation is analogous to equation (A.2) in Chen and Little (1999). The rest of the proof of identifiability thus proceeds along the same lines as the proof of Lemma A.1.1 in Chen and Little (1999), and is omitted.

Hence, for any given subsequence of $n$, we have found a further subsequence $n_j$ such that $\|\hat\beta_{n_j} - \beta_0\|$, $\|\hat\gamma_{n_j} - \gamma_0\|$, and $\sup_{t \in [0,\tau]} |\hat\Lambda_{k,n_j}(t) - \Lambda_{k,0}(t)|$ (for every $k \in \mathcal{K}$) converge to 0 almost surely, which implies that the sequence of NPMLE $\hat\theta_n$ converges almost surely to $\theta_0$.



□ 



A.3 Proof of Theorem 4.2

The proof of Theorem 4.2 uses similar arguments as the proof of Theorem 3 of Fang et al., so we only highlight the parts that are different. We need a few lemmas before presenting the proof.
Lemma 5.1. Let $h \in H$. Then the following holds: $P_{\theta_0}[S_1(\theta_0)(h)] = P_{\theta_0}\big[ h_\beta' S_\beta(\theta_0) + h_\gamma' S_\gamma(\theta_0) + \sum_{k=1}^{K} S_{\Lambda_k}(\theta_0)(h_{\Lambda_k}) \big] = 0$.



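Proof. From the properties of the conditional expectation, we first note that

$$
\begin{aligned}
P_{\theta_0}[S_\beta(\theta_0)]
&= P_{\theta_0}\left[ \Delta X - \sum_{k=1}^{K} Q(O, k, \theta_0)\, X \exp(\beta_0' X)\, \Lambda_{k,0}(T) \right] \\
&= P_{\theta_0}\left[ \Delta X - \sum_{k=1}^{K} 1\{S = k\}\, X \exp(\beta_0' X)\, \Lambda_{k,0}(T) \right] \\
&= P_{\theta_0}[X M(\tau)],
\end{aligned}
$$

where $M(t) = N(t) - \int_0^t \sum_{k=1}^{K} 1\{S = k\}\, e^{\beta_0' X}\, Y(u)\, d\Lambda_{k,0}(u)$ is the counting process martingale with respect to the filtration $\mathcal{F}_t = \sigma\{N(u), 1\{C \le u\}, X, S, W : 0 \le u \le t\}$. $X$ is bounded and $\mathcal{F}_t$-measurable, hence it follows that $P_{\theta_0}[S_\beta(\theta_0)] = 0$. Using similar arguments, we can verify that $P_{\theta_0}[S_{\Lambda_k}(\theta_0)(h_{\Lambda_k})] = 0$, $k \in \mathcal{K}$. Finally, for $k = 1, \ldots, K - 1$,

$$
\begin{aligned}
P_{\theta_0}[S_{\gamma_k}(\theta_0)]
&= P_{\theta_0}\left[ W \left[ Q(O, k, \theta_0) - \pi_{k,\gamma_0}(W) \right] \right] \\
&= P_{\theta_0}\left[ W\, P_{\theta_0}\left[ 1\{S = k\} - \pi_{k,\gamma_0}(W) \mid W \right] \right] \\
&= 0.
\end{aligned}
$$

Combining these results yields that $P_{\theta_0}[S_1(\theta_0)(h)] = 0$.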
□ 
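The key step of the proof, $P_{\theta_0}[X M(\tau)] = 0$, can be checked numerically in a simplified setting. The Monte Carlo sketch below assumes a single stratum ($K = 1$), a scalar covariate uniform on $[-1, 1]$, a unit exponential baseline (so $\Lambda_0(t) = t$), exponential censoring, and administrative censoring at $\tau$. All of these choices, and the function names, are illustrative assumptions rather than part of the paper's setup.

```python
import math
import random


def martingale_at_tau(x, beta, tau, censoring_rate, rng):
    """One draw of M(tau) = N(tau) - exp(beta * x) * T in a single-stratum model.

    With baseline cumulative intensity Lambda_0(t) = t, the compensator of N
    at tau is exp(beta * x) * T, where T = min(failure, censoring, tau).
    """
    intensity = math.exp(beta * x)
    failure = rng.expovariate(intensity)
    censoring = rng.expovariate(censoring_rate)
    observed = min(failure, censoring, tau)
    delta = 1.0 if failure <= min(censoring, tau) else 0.0
    return delta - intensity * observed


def mean_score(n, beta=0.5, tau=1.0, censoring_rate=0.5, seed=0):
    """Monte Carlo estimate of P[X M(tau)], the mean of the score for beta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)  # bounded covariate, as the proof requires
        total += x * martingale_at_tau(x, beta, tau, censoring_rate, rng)
    return total / n


# The estimate should be close to zero, up to Monte Carlo error.
print(mean_score(50000))
```

In the stratified model the same computation applies stratum by stratum, because the compensator of $N$ only involves the baseline of the stratum the individual belongs to.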



We now come to the continuous invertibility of the continuous linear operator $\sigma$ defined in Section 4.

Lemma 5.2. The operator $\sigma$ is continuously invertible.

Proof. Since $H$ is a Banach space, to prove that $\sigma$ is continuously invertible, it is sufficient to prove that $\sigma$ is one-to-one and that it can be written as the sum of a bounded linear operator with a bounded inverse and a compact operator (Lemma 25.93 of van der Vaart (1998)).

Define the linear operator $A(h) = (h_\beta, h_\gamma, P_{\theta_0}[1\{S = k\}\, \phi(\cdot, O, k, \theta_0)]\, h_{\Lambda_k}(\cdot); k \in \mathcal{K})$; this is a bounded operator due to the boundedness of $X$. Moreover, for all $u \in [0, \tau]$ and $k \in \mathcal{K}$, $P_{\theta_0}[1\{S = k\}\, \phi(u, O, k, \theta_0)] > c_2 c_4 > 0$ by assumptions (c) and (d). This implies that $A$ is invertible with bounded inverse $A^{-1}(h) = (h_\beta, h_\gamma, P_{\theta_0}[1\{S = k\}\, \phi(\cdot, O, k, \theta_0)]^{-1} h_{\Lambda_k}(\cdot); k \in \mathcal{K})$. The operator $\sigma - A$ can be shown to be compact by using the same techniques as in Lu (2008) for example.

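To prove that $\sigma$ is one-to-one, let $h \in H$ be such that $\sigma(h) = 0$. If $\sigma(h) = 0$, then $P_{\theta_0}[S_1(\theta_0)(h)^2] = 0$, and therefore $S_1(\theta_0)(h) = 0$ almost surely. Let $j \in \mathcal{K}$. By assumption (e), for almost every $t \in [0, \tau]$, $\|x\| \le c_1$, and $\|w\| \le c_1$, there is a non-negligible set $\Omega_{t,x,w} \subseteq \Omega$ such that $\Delta(\omega) = 1$, $R(\omega) = 1$, and $1\{S(\omega) = j\} = 1$ when $\omega \in \Omega_{t,x,w}$. If $S_1(\theta_0)(h) = 0$ almost surely, then in particular, for almost every $t \in [0, \tau]$, $\|x\| \le c_1$, and $\|w\| \le c_1$, $S_1(\theta_0)(h) = 0$ when $\omega \in \Omega_{t,x,w}$, which yields the following equation:

$$
h_{\Lambda_j}(t) + h_\beta' x + w' h_{\gamma_j} - \sum_{k=1}^{K-1} w' h_{\gamma_k}\, \pi_{k,\gamma_0}(w)
- e^{\beta_0' x} \left[ \int_0^t h_{\Lambda_j}(s)\, d\Lambda_{j,0}(s) + h_\beta' x\, \Lambda_{j,0}(t) \right] = 0, \tag{7}
$$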
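with $h_{\gamma_j} = 0$ when $j = K$. Then, by choosing $t$ arbitrarily close to 0, and since $\Lambda_{j,0}$ is continuous, $\Lambda_{j,0}(0) = 0$, and $h_{\Lambda_j}$ is continuous from the right at 0, we get that

$$
h_{\Lambda_j}(0) + h_\beta' x + w' h_{\gamma_j} - \sum_{k=1}^{K-1} w' h_{\gamma_k}\, \pi_{k,\gamma_0}(w) = 0. \tag{8}
$$

Taking the difference (7) − (8) yields that

$$
h_{\Lambda_j}(t) - h_{\Lambda_j}(0) = e^{\beta_0' x} \left[ \int_0^t h_{\Lambda_j}(s)\, d\Lambda_{j,0}(s) + h_\beta' x\, \Lambda_{j,0}(t) \right] \tag{9}
$$

for almost every $t \in [0, \tau]$ and $\|x\| \le c_1$. Since $\Lambda_{j,0}$ is increasing (by assumption (b)), for every $t > 0$, $\Lambda_{j,0}(t) > \Lambda_{j,0}(0) = 0$ and therefore (9) can be rewritten as

$$
\frac{h_{\Lambda_j}(t) - h_{\Lambda_j}(0)}{\Lambda_{j,0}(t)} = e^{\beta_0' x} \left[ r(t) + h_\beta' x \right], \tag{10}
$$

where $r(t) = \int_0^t h_{\Lambda_j}(s)\, d\Lambda_{j,0}(s) / \Lambda_{j,0}(t)$. Consider first the case where $\beta_0 = 0$. Since the left-hand side of (10) does not depend on $x$, $h_\beta$ must equal 0. Next, consider the case where $\beta_0 \ne 0$. Let $t_1, t_2 > 0$. Subtracting (10) at $t_2$ from (10) at $t_1$ shows that $e^{\beta_0' x}[r(t_1) - r(t_2)]$ does not depend on $x$. Since the covariance matrix of $X$ is positive definite, we can find two distinct values $x_1$ and $x_2$ of $X$, with $\beta_0' x_1 \ne \beta_0' x_2$, such that $e^{\beta_0' x_1}[r(t_1) - r(t_2)] = e^{\beta_0' x_2}[r(t_1) - r(t_2)]$. This implies that $r(t_1) = r(t_2)$, from which we deduce that $h_{\Lambda_j}(t)$ has to be constant (say, equal to $a$) for almost every $t \in (0, \tau]$. From (10), we then deduce that $h_{\Lambda_j}(0) = a$, which further implies that $h_\beta = 0$, $a = 0$, and thus $h_{\Lambda_j}(t) = 0$ for almost every $t \in [0, \tau]$ ($j \in \mathcal{K}$). This, together with (8), implies that $h_{\gamma_j} = 0$ ($j = 1, \ldots, K - 1$).

Let $k = K$. Then $\sigma_{\Lambda_K}(h)(u) = P_{\theta_0}[1\{S = K\}\, \phi(u, O, K, \theta_0)]\, h_{\Lambda_K}(u) = 0$ for all $u \in [0, \tau]$ since $h_\beta = 0$ and $h_\gamma = 0$. By assumptions (c) and (d), for every $u \in [0, \tau]$ and $k \in \mathcal{K}$,

$$
\begin{aligned}
P_{\theta_0}[1\{S = k\}\, \phi(u, O, k, \theta_0)]
&= P_{\theta_0}\left[ 1\{S = k\}\, Y(u)\, Q(O, k, \theta_0)\, e^{\beta_0' X} \right] \\
&\ge P_{\theta_0}\left[ 1\{S = k\}\, Y(\tau)\, R\, e^{\beta_0' X} \right] > 0,
\end{aligned}
$$

hence we conclude that $h_{\Lambda_K}$ is identically equal to 0 on $[0, \tau]$. Next, considering $\sigma_{\Lambda_{K-1}}(h)(u) = 0$ with $h_\beta = 0$, $h_\gamma = 0$ and $h_{\Lambda_K} = 0$, we conclude similarly that $h_{\Lambda_{K-1}}(u) = 0$ for every $u \in [0, \tau]$. It follows that $h_{\Lambda_j}$ is identically equal to 0 on $[0, \tau]$ for every $j \in \mathcal{K}$. Therefore, $\sigma$ is one-to-one.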

□ 



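We now turn to the proof of Theorem 4.2 itself. Similar to Fang et al., we get that

$$
\sqrt{n} \left\{ h_\beta'(\hat\beta_n - \beta_0) + h_\gamma'(\hat\gamma_n - \gamma_0) + \sum_{k=1}^{K} \int_0^\tau h_{\Lambda_k}(s)\, d(\hat\Lambda_{k,n} - \Lambda_{k,0})(s) \right\}
= \sqrt{n} \left( S_n(\theta_0)(\sigma^{-1}(h)) - P_{\theta_0}\left[ S_1(\theta_0)(\sigma^{-1}(h)) \right] \right) + o_p(1),
$$

where $S_n$ is as defined earlier. Consider the subset $\{(h_\beta, 0, 0; k \in \mathcal{K}) \mid h_\beta \in \mathbb{R}^p\} \subset H$ and let $h$ be an element of this subset. For this choice of $h$, the above equation yields

$$
\sqrt{n}\, h_\beta'(\hat\beta_n - \beta_0) = \sqrt{n} \left( S_n(\theta_0)(\sigma^{-1}(h)) - P_{\theta_0}\left[ S_1(\theta_0)(\sigma^{-1}(h)) \right] \right) + o_p(1). \tag{11}
$$

By Lemma 5.1, the central limit theorem, and Slutsky's theorem, $\sqrt{n}\, h_\beta'(\hat\beta_n - \beta_0)$ is asymptotically normal with mean 0 and variance $P_{\theta_0}[S_1(\theta_0)(\sigma^{-1}(h))^2]$. If $h \in H$, direct calculation yields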

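$$
\begin{aligned}
S_1(\theta_0)(h)^2 ={}& h_\beta' S_\beta(\theta_0) S_\beta(\theta_0)' h_\beta + h_\gamma' S_\gamma(\theta_0) S_\gamma(\theta_0)' h_\gamma + 2 h_\beta' S_\beta(\theta_0) S_\gamma(\theta_0)' h_\gamma \\
&+ 2 \left\{ h_\beta' S_\beta(\theta_0) + h_\gamma' S_\gamma(\theta_0) \right\} \left( \sum_{k=1}^{K} S_{\Lambda_k}(\theta_0)(h_{\Lambda_k}) \right) \\
&+ \sum_{k=1}^{K} Q(O, k, \theta_0)^2 \left( h_{\Lambda_k}(T)\Delta - \exp(\beta_0' X) \int_0^T h_{\Lambda_k}(s)\, d\Lambda_{k,0}(s) \right)^2 \\
&+ 2 \sum_{k=1}^{K} \sum_{j > k} Q(O, k, \theta_0) \left( h_{\Lambda_k}(T)\Delta - \exp(\beta_0' X) \int_0^T h_{\Lambda_k}(s)\, d\Lambda_{k,0}(s) \right) \\
&\qquad\qquad \times Q(O, j, \theta_0) \left( h_{\Lambda_j}(T)\Delta - \exp(\beta_0' X) \int_0^T h_{\Lambda_j}(s)\, d\Lambda_{j,0}(s) \right).
\end{aligned}
$$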

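Taking expectations, followed by some tedious algebraic manipulation and rearrangement of terms, yields that

$$
P_{\theta_0}\left[ S_1(\theta_0)(h)^2 \right] = h_\beta' \sigma_\beta(h) + h_\gamma' \sigma_\gamma(h) + \sum_{k=1}^{K} \int_0^\tau \sigma_{\Lambda_k}(h)(s)\, h_{\Lambda_k}(s)\, d\Lambda_{k,0}(s).
$$

Therefore

$$
\begin{aligned}
P_{\theta_0}\left[ S_1(\theta_0)(\sigma^{-1}(h))^2 \right]
&= \sigma_\beta^{-1}(h)' \sigma_\beta(\sigma^{-1}(h)) + \sigma_\gamma^{-1}(h)' \sigma_\gamma(\sigma^{-1}(h)) + \sum_{k=1}^{K} \int_0^\tau \sigma_{\Lambda_k}(\sigma^{-1}(h))(s)\, \sigma_{\Lambda_k}^{-1}(h)(s)\, d\Lambda_{k,0}(s) \\
&= \sigma_\beta^{-1}(h)' h_\beta + \sigma_\gamma^{-1}(h)' h_\gamma + \sum_{k=1}^{K} \int_0^\tau h_{\Lambda_k}(s)\, \sigma_{\Lambda_k}^{-1}(h)(s)\, d\Lambda_{k,0}(s),
\end{aligned}
$$

where the last equality comes from the fact that

$$
\sigma(\sigma^{-1}(h)) = \left( \sigma_\beta(\sigma^{-1}(h)), \sigma_\gamma(\sigma^{-1}(h)), \sigma_{\Lambda_k}(\sigma^{-1}(h)); k \in \mathcal{K} \right) = h.
$$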

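Now, recall that the linear map $\tilde\sigma_\beta^{-1} : \mathbb{R}^p \to \mathbb{R}^p$ was defined in Section 4 as a restricted version of $\sigma^{-1}$, by setting $\tilde\sigma_\beta^{-1}(h_\beta) = \sigma_\beta^{-1}(h)$ for any $h$ of the form $(h_\beta, 0, 0; k \in \mathcal{K})$. Let $\{e_1, \ldots, e_p\}$ be the canonical basis of $\mathbb{R}^p$ and $\Sigma_\beta = (\tilde\sigma_\beta^{-1}(e_1), \ldots, \tilde\sigma_\beta^{-1}(e_p))$. Then for any $h_\beta \in \mathbb{R}^p$, we have $\tilde\sigma_\beta^{-1}(h_\beta) = \Sigma_\beta h_\beta$ and thus $P_{\theta_0}[S_1(\theta_0)(\sigma^{-1}(h))^2] = h_\beta' \Sigma_\beta h_\beta$. Hence, for every $h_\beta \in \mathbb{R}^p$, $\sqrt{n}\, h_\beta'(\hat\beta_n - \beta_0)$ converges in distribution to $\mathcal{N}(0, h_\beta' \Sigma_\beta h_\beta)$. By the Cramér-Wold device, $\sqrt{n}(\hat\beta_n - \beta_0)$ converges in distribution to $\mathcal{N}(0, \Sigma_\beta)$.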
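Now, for $j = 1, \ldots, p$, denote $h_j = (e_j, 0, 0; k \in \mathcal{K})$. Letting $h = h_j$ for each $j = 1, \ldots, p$ in turn in (11) yields

$$
\sqrt{n}(\hat\beta_n - \beta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} l_\beta(O_i, \theta_0) + o_p(1),
$$

where

$$
l_\beta(O, \theta_0) = \Sigma_\beta S_\beta(\theta_0) + \Xi\, S_\gamma(\theta_0) + \sum_{k=1}^{K} S_{\Lambda_k}(\theta_0)(\Xi_k^*),
$$

$\Xi$ and $\Xi_k^*$ are $(p \times q)$ and $(p \times 1)$ matrices respectively defined by

$$
\Xi = \left( \sigma_\gamma^{-1}(h_1), \ldots, \sigma_\gamma^{-1}(h_p) \right)'
\quad \text{and} \quad
\Xi_k^* = \left( \sigma_{\Lambda_k}^{-1}(h_1), \ldots, \sigma_{\Lambda_k}^{-1}(h_p) \right)',
$$

and $S_{\Lambda_k}(\theta_0)$ is applied componentwise to $\Xi_k^*$. Thus $\hat\beta_n$ is an asymptotically linear estimator for $\beta_0$, and its influence function $l_\beta(O, \theta_0)$ belongs to the tangent space spanned by the score functions. It follows that $l_\beta(O, \theta_0)$ is the efficient influence function for $\beta_0$, and that $\hat\beta_n$ is semiparametrically efficient (see Bickel et al. (1993) or Tsiatis (2006)).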



□ 



A.4 Proof of Theorem 4.3

The proof of asymptotic normality of $\sqrt{n}(\hat\gamma_n - \gamma_0)$ proceeds along the same lines as for $\sqrt{n}(\hat\beta_n - \beta_0)$, and is therefore omitted.

Next, for any $t \in (0, \tau)$ and $j \in \mathcal{K}$, the asymptotic normality of $\sqrt{n}(\hat\Lambda_{j,n}(t) - \Lambda_{j,0}(t))$ can be proved by using a similar argument with $h$ replaced by $h_{(j,t)} = (h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K})$, where $h_\beta = 0$, $h_\gamma = 0$, $h_{\Lambda_j}(\cdot) = 1\{\cdot \le t\}$, and $h_{\Lambda_k} = 0$ for every $k \in \mathcal{K}$, $k \ne j$. Details are omitted.



□ 



A.5 Proof of Theorem 4.4

The proof of Theorem 4.4 parallels the proof of Theorem 3 in Parner (1998) and thus will be kept brief. Let $\hat\sigma_n = (\hat\sigma_{\beta,n}, \hat\sigma_{\gamma,n}, \hat\sigma_{\Lambda_k,n}; k \in \mathcal{K})$ be defined as $\sigma$ with all of the $\theta_0$ and $P_{\theta_0}$ replaced by $\hat\theta_n$ and $P_n$ respectively. Similar to the proof of Theorem 3 in Parner (1998), it can be shown that $\hat\sigma_n$ converges in probability to $\sigma$ uniformly over $H$ and that its inverse $\hat\sigma_n^{-1} = (\hat\sigma_{\beta,n}^{-1}, \hat\sigma_{\gamma,n}^{-1}, \hat\sigma_{\Lambda_k,n}^{-1}; k \in \mathcal{K})$ is such that $\hat\sigma_n^{-1}(h)$ converges to $\sigma^{-1}(h)$ in probability.

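For every $h_\beta$, the asymptotic variance of $\sqrt{n}\, h_\beta'(\hat\beta_n - \beta_0)$ is $h_\beta' \sigma_\beta^{-1}((h_\beta, 0, 0; k \in \mathcal{K}))$, which is consistently estimated by $h_\beta' \hat\sigma_{\beta,n}^{-1}((h_\beta, 0, 0; k \in \mathcal{K}))$. Let $h_n = (h_{\beta,n}, h_{\gamma,n}, h_{\Lambda_k,n}; k \in \mathcal{K}) = \hat\sigma_n^{-1}((h_\beta, 0, 0; k \in \mathcal{K}))$. Then $\hat\sigma_n(h_n) = (h_\beta, 0, 0; k \in \mathcal{K})$, or

$$
\begin{aligned}
\hat\sigma_{\beta,n}(h_n) &= h_\beta, \\
\hat\sigma_{\gamma,n}(h_n) &= 0, \\
\hat\sigma_{\Lambda_k,n}(h_n)(u) &= 0, \quad k \in \mathcal{K},\ u \in [0, \tau].
\end{aligned}
\tag{12}
$$

In particular, letting $u = T_1, \ldots, T_n$ in (12) yields the following system of equations:

$$
A_n \begin{pmatrix} h_{\beta,n} \\ h_{\gamma,n} \\ h_{\Lambda,n} \end{pmatrix} = \begin{pmatrix} h_\beta \\ 0 \\ 0 \end{pmatrix}, \tag{13}
$$

where $h_{\Lambda,n} = (h_{\Lambda_1,n}(T_1), \ldots, h_{\Lambda_1,n}(T_n), \ldots, h_{\Lambda_K,n}(T_1), \ldots, h_{\Lambda_K,n}(T_n))'$, and $A_n$ is defined by (4). Some simple algebra on (13) yields that $h_{\beta,n} = \Sigma_{\beta,n} h_\beta$, where $\Sigma_{\beta,n}$ is defined in Section 4, and therefore $h_\beta' \Sigma_{\beta,n} h_\beta$ is a consistent estimator of the asymptotic variance of $\sqrt{n}\, h_\beta'(\hat\beta_n - \beta_0)$ for every $h_\beta$. We conclude that $\Sigma_{\beta,n}$ converges in probability to $\Sigma_\beta$. The consistency of $\Sigma_{\gamma,n}$ proceeds along the same lines and is therefore omitted.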
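Computationally, the variance estimate for a linear combination $h_\beta' \hat\beta_n$ can be obtained by solving one linear system and reading off the first $p$ coordinates of the solution. The sketch below assumes the system has the block form $A_n h_n = (h_\beta, 0, \ldots, 0)'$ and uses a hypothetical $2 \times 2$ matrix in place of $A_n$, whose actual definition is given by (4) and is not reproduced here. The solver is plain Gaussian elimination with partial pivoting.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n + 1):
                a[row][j] -= factor * a[col][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


def variance_of_linear_combination(a_n, h_beta):
    """Return h_beta' h_{beta,n}, where h_n solves A_n h_n = (h_beta, 0, ..., 0)'."""
    p = len(h_beta)
    rhs = list(h_beta) + [0.0] * (len(a_n) - p)
    solution = solve_linear_system(a_n, rhs)
    return sum(h * s for h, s in zip(h_beta, solution[:p]))


# Hypothetical 2 x 2 matrix standing in for A_n (p = 1, one nuisance coordinate).
toy_a_n = [[2.0, 1.0],
           [1.0, 1.0]]
print(variance_of_linear_combination(toy_a_n, [1.0]))
```

Solving the system for each canonical basis vector $e_1, \ldots, e_p$ in turn gives the columns of the upper-left $p \times p$ block of $A_n^{-1}$, which under the assumed block form plays the role of $\Sigma_{\beta,n}$.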

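We now turn to the estimation of the asymptotic variance of $\hat\Lambda_{j,n}(t)$, for $t \in (0, \tau)$ and $j \in \mathcal{K}$. By the dominated convergence theorem and the consistency results established above,

$$
\int_0^t \hat\sigma_{\Lambda_j,n}^{-1}(h_{(j,t)})(u)\, d\hat\Lambda_{j,n}(u)
$$

converges to $v_j^2(t) = \int_0^t \sigma_{\Lambda_j}^{-1}(h_{(j,t)})(u)\, d\Lambda_{j,0}(u)$, where we recall that $h_{(j,t)}$ is the element $(h_\beta, h_\gamma, h_{\Lambda_k}; k \in \mathcal{K})$ such that $h_\beta = 0$, $h_\gamma = 0$, $h_{\Lambda_j}(\cdot) = 1\{\cdot \le t\}$ for some $t \in (0, \tau)$ and $j \in \mathcal{K}$, and $h_{\Lambda_k} = 0$ for every $k \in \mathcal{K}$, $k \ne j$. Letting $h_n = (h_{\beta,n}, h_{\gamma,n}, h_{\Lambda_k,n}; k \in \mathcal{K}) = \hat\sigma_n^{-1}(h_{(j,t)})$, we get that $\hat\sigma_n(h_n) = h_{(j,t)}$, or:

$$
\begin{aligned}
\hat\sigma_{\beta,n}(h_n) &= 0, \\
\hat\sigma_{\gamma,n}(h_n) &= 0, \\
\hat\sigma_{\Lambda_j,n}(h_n)(u) &= 1\{u \le t\}, \quad u \in [0, \tau], \\
\hat\sigma_{\Lambda_k,n}(h_n)(u) &= 0, \quad k \in \mathcal{K},\ k \ne j,\ u \in [0, \tau].
\end{aligned}
\tag{14}
$$

In particular, letting $u = T_1, \ldots, T_n$ in (14) yields the system of equations

$$
A_n \begin{pmatrix} h_{\beta,n} \\ h_{\gamma,n} \\ h_{\Lambda,n} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ U_{(j,t)} \end{pmatrix}, \tag{15}
$$

with the notations $h_{\Lambda,n} = (h_{\Lambda_1,n}(T_1), \ldots, h_{\Lambda_1,n}(T_n), \ldots, h_{\Lambda_K,n}(T_1), \ldots, h_{\Lambda_K,n}(T_n))'$ and $U_{(j,t)} = (0_{(j-1)n}', 1\{T_1 \le t\}, \ldots, 1\{T_n \le t\}, 0_{(K-j)n}')'$. Similar algebra as above yields

$$
h_{\Lambda,n} = \Sigma_{\Lambda,n} U_{(j,t)},
$$

where $\Sigma_{\Lambda,n}$ is defined in Section 4. Now, $\int_0^t \hat\sigma_{\Lambda_j,n}^{-1}(h_{(j,t)})(u)\, d\hat\Lambda_{j,n}(u)$ verifies

$$
\int_0^t \hat\sigma_{\Lambda_j,n}^{-1}(h_{(j,t)})(u)\, d\hat\Lambda_{j,n}(u)
= \sum_{i=1}^{n} \hat\sigma_{\Lambda_j,n}^{-1}(h_{(j,t)})(T_i)\, \Delta\hat\Lambda_{j,n}(T_i)\, 1\{T_i \le t\}
= V_{(j,t)}' \Sigma_{\Lambda,n} U_{(j,t)},
$$

where $V_{(j,t)} = (0_{(j-1)n}', \Delta\hat\Lambda_{j,n}(T_1) 1\{T_1 \le t\}, \ldots, \Delta\hat\Lambda_{j,n}(T_n) 1\{T_n \le t\}, 0_{(K-j)n}')'$. It follows that $V_{(j,t)}' \Sigma_{\Lambda,n} U_{(j,t)}$ is a consistent estimator of $v_j^2(t)$, which concludes the proof.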

□ 